Research Paper: ALICE: An Algorithm to Extract Abbreviations from MEDLINE
Abstract
OBJECTIVE: To help biomedical researchers recognize dynamically introduced abbreviations in biomedical literature, such as gene and protein names, we have constructed a support system called ALICE (Abbreviation LIfter using Corpus-based Extraction). ALICE aims to extract all types of abbreviations with their expansions from a target paper on the fly. METHODS: ALICE extracts an abbreviation and its expansion from the literature by using heuristic pattern-matching rules. The system consists of three phases and can potentially identify 320 valid abbreviation-expansion patterns as combinations of the rules. RESULTS: It achieved 95% recall and 97% precision on randomly selected titles and abstracts from the MEDLINE database. CONCLUSION: ALICE extracted abbreviations and their expansions from the literature efficiently. The carefully compiled heuristics enabled it to extract abbreviations with high recall without significantly reducing precision. ALICE not only facilitates recognition of an undefined abbreviation in a paper by constructing an abbreviation database or dictionary, but also makes biomedical literature retrieval more accurate. This system is freely available at http://uvdb3.hgc.jp/ALICE/ALICE_index.html.
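The abstract does not list ALICE's actual rules or its three phases, so the following is only a minimal sketch of what one heuristic pattern-matching rule of this general kind might look like. It assumes the common "expansion (ABBR)" layout and a simple first-letter match. The function name `extract_pairs` and the regular expression are hypothetical illustrations, not part of ALICE.

```python
import re

# Hypothetical pattern (not from ALICE): a short alphanumeric token in parentheses.
PAREN = re.compile(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)")


def extract_pairs(text):
    """Return (abbreviation, expansion) pairs found by a first-letter heuristic."""
    pairs = []
    for match in PAREN.finditer(text):
        abbr = match.group(1)
        # All words that appear before the opening parenthesis.
        words = re.findall(r"[A-Za-z0-9]+", text[:match.start()])
        letters = [c.lower() for c in abbr if c.isalnum()]
        n = len(letters)
        if n < 2 or len(words) < n:
            continue
        # Candidate expansion: the n words immediately before the parenthesis.
        candidate = words[-n:]
        # Accept only if each abbreviation character starts the matching word.
        if all(w[0].lower() == c for w, c in zip(candidate, letters)):
            pairs.append((abbr, " ".join(candidate)))
    return pairs
```

A single rule like this misses many real cases (for example, abbreviations built from inner letters or expansions containing stop words), which is presumably why a system such as ALICE combines many rules rather than relying on one.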
Related papers
Research Paper: Creating an Online Dictionary of Abbreviations from MEDLINE
OBJECTIVE The growth of the biomedical literature presents special challenges for both human readers and automatic algorithms. One such challenge derives from the common and uncontrolled use of abbreviations in the literature. Each additional abbreviation increases the effective size of the vocabulary for a field. Therefore, to create an automatically generated and maintained lexicon of abbrevi...
A Method to Retrieve Papers from MEDLINE: PETER System
We attempted to eliminate non-relevant papers from results of PubMed searches for each topic. The system is called PETER (PubMed Enhancer Toward Efficient Research) and it works as follows. 1. get LocusLink IDs manually. 2. collect information of gene names (AKA synonyms) from public databases. 3. make synonym variations automatically. 4. search papers by PubMed with each synonym. 5. extract ti...
Mining Terminological Knowledge in Large Biomedical Corpora
Terminological knowledge of the biomedical domain is important for natural language processing (NLP) and information retrieval (IR) applications, and a number of terminological knowledge sources, such as LocusLink, GeneBank, and the UMLS, already exist. However, because of the tremendous amount of research activity in the field, new terms and symbols are continually being created, many of which...
A New Text Mining Approach for Finding Protein-to-Disease Associations
Discovering significant relationships between biological entities from text documents is an important task for biologists in order to develop biological models for research and discovery, especially with the existing gigantic amounts of biomedical documents and the rate at which they are increasing every day. We propose a new text mining method to extract associations between biological entities...
Journal: Journal of the American Medical Informatics Association (JAMIA)
Volume 12, Issue 5
Publication date: 2005